Methodology for Building Extraction Templates for Russian Language in Knowledge-Based IE Systems
Authors
Valery Solovyev, Vladimir Ivanov, Rinat Gareev (Kazan Federal University, Kazan, Russia)
Sergey Serebryakov, Natalia Vassilieva (Hewlett-Packard Laboratories, Saint-Petersburg, Russia)
Report: HP Laboratories HPL-2012-211
Keywords: event extraction; dictionaries; rules; patterns; meaning-text model; Chomsky grammars
External Posting Date: October 6, 2012. Approved for External Publication.
Internal Posting Date: October 6, 2012.
Copyright 2012 Hewlett-Packard Development Company, L.P.
Abstract
In this technical report we describe a methodology for building information extraction (IE) rules. Rules are usually developed by experts and are widely used in knowledge-based IE systems. A rule consists of two parts: the left-hand side (LHS) is a template that matches a certain syntactico-semantic structure (SSS), and the right-hand side (RHS) is an action that is executed when the LHS template is matched against a particular text fragment. In the report we describe the process of building the more complex LHS part (the template). This methodology was used to develop an information extraction system that extracts business events from news articles written in Russian.
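The two-part rule structure described in the abstract can be illustrated with a minimal Python sketch. It is not the system from the report: the `Rule` class, the token keys (`text`, `lemma`, `type`), the `Acquisition` event, and the company names are all hypothetical, and the template here is a flat sequence of per-token constraints rather than a full syntactico-semantic structure.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A token is a dict with hypothetical keys: "text", "lemma", "type".
Token = dict
# A slot constraint accepts or rejects a single token.
Constraint = Callable[[Token], bool]


@dataclass
class Rule:
    lhs: list[Constraint]               # template: one constraint per token
    rhs: Callable[[list[Token]], dict]  # action run on the matched fragment

    def apply(self, tokens: list[Token]) -> Optional[dict]:
        """Run the action on the first fragment that the template matches."""
        width = len(self.lhs)
        for start in range(len(tokens) - width + 1):
            fragment = tokens[start:start + width]
            if all(check(tok) for check, tok in zip(self.lhs, fragment)):
                return self.rhs(fragment)
        return None  # no fragment matched, so the action is not executed


def is_org(tok: Token) -> bool:
    return tok.get("type") == "ORG"


def is_buy_verb(tok: Token) -> bool:
    return tok.get("lemma") == "купить"  # "to buy"


# Hypothetical business-event rule: ORG, a form of "купить", ORG.
acquisition = Rule(
    lhs=[is_org, is_buy_verb, is_org],
    rhs=lambda m: {"event": "Acquisition",
                   "buyer": m[0]["text"],
                   "target": m[2]["text"]},
)

# Made-up sentence: "Yesterday Alfa bought Beta."
tokens = [
    {"text": "Вчера", "lemma": "вчера", "type": "ADV"},
    {"text": "Альфа", "lemma": "альфа", "type": "ORG"},
    {"text": "купил", "lemma": "купить", "type": "VERB"},
    {"text": "Бета", "lemma": "бета", "type": "ORG"},
]
event = acquisition.apply(tokens)
```

Matching on lemmas rather than surface forms is one plausible way to cope with Russian inflection; whether the report's templates do so is an assumption of this sketch.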
Similar resources
Knowledge-Driven Event Extraction in Russian: Corpus-Based Linguistic Resources
Automatic event extraction from text is an important step in knowledge acquisition and knowledge base population. Manual work in development of extraction system is indispensable either in corpus annotation or in vocabularies and pattern creation for a knowledge-based system. Recent works have been focused on adaptation of existing system (for extraction from English texts) to new domains. Even...
Information Extraction for SQL Query Generation in the Conversation-Based Interfaces to Relational Databases (C-BIRD)
This paper presents a novel methodology of incorporating Information Extraction (IE) techniques into an Enhanced Conversation-Based Interface to Relational Databases (C-BIRD) in order to generate dynamic SQL queries. Conversational Agents can converse with the user in natural language about a specific problem domain. In C-BIRD, such agents allow a user to converse with a relational database in ...
Automatic Creation of Domain Templates
Recently, many Natural Language Processing (NLP) applications have improved the quality of their output by using various machine learning techniques to mine Information Extraction (IE) patterns for capturing information from the input text. Currently, to mine IE patterns one should know in advance the type of the information that should be captured by these patterns. In this work we propose a n...
Using Support Vector Machines for Terrorism Information Extraction
Information extraction (IE) is of great importance in many applications including web intelligence, search engines, text understanding, etc. To extract information from text documents, most IE systems rely on a set of extraction patterns. Each extraction pattern is defined based on the syntactic and/or semantic constraints on the positions of desired entities within natural language sentences....
Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application
The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...
Publication date: 2012